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Abstract — In this paper we propose a new wavelet transform 
applicable to functions defined on graphs, high dimensional data 
and networks. The proposed method generalizes the Haar-like 
transform proposed in (J), and it is defined via a hierarchical 
tree, which is assumed to capture the geometry and structure 
of the input data. It is applied to the data using a modified 
version of the common ID wavelet filtering and decimation 
scheme, which can employ different wavelet filters. In each level 
of this wavelet decomposition scheme, a permutation derived 
from the tree is applied to the approximation coefficients, before 
they are filtered. We propose a tree construction method that 
results in an efficient representation of the input function in the 
transform domain. We show that the proposed transform is more 
efficient than both the one-dimensional (ID) and two-dimensional 
(2D) separable wavelet transforms in representing images. We 
also explore the application of the proposed transform to image 
denoising, and show that combined with a subimage averaging 
scheme, it achieves denoising results which are similar to those 
obtained with the K-SVD algorithm. 

Index Terms — Wavelet transform, Hierarchical trees, Efficient 
signal representation, Image denoising. 



I. Introduction 

Most traditional signal processing methods are designed 
for data defined on regular Euclidean grids. Development 
of comparable methods capable of handling non-uniformly 
sampled signals, data defined on graphs or "point clouds", 
is important and very much needed. Many signal processing 
problems involve inference of an unknown scalar target func- 
tion defined on such data. For example, function denoising 
involves estimating such a scalar function from its noisy ver- 
sion. A different example is image inpainting which involves 
estimating missing pixels of a column stacked version of an 
image from its known pixels. A major challenge in processing 
functions on topologically complicated data, is to find efficient 
methods to represent and learn them. 

Many signal processing techniques are based on transform 
methods, which represent the input data in a new basis, before 
analyzing or processing it. One of the most successful types 
of transforms, which has been proven to be a very useful 
tool for signal and image processing, is wavelet analysis 
El . A major advantage of wavelet methods is their ability 
to simultaneously localize signal content in both space and 
frequency. This property allows them to compactly represent 
signals such as ID steps or images with edges, whose primary 
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information content lies in localized singularities. Moreover, 
wavelet methods represent such signals much more compactly 
than either the original domain or transforms with global basis 
elements such as the Fourier transform. We aim at extending 
the wavelet transform to irregular, non-Euclidean spaces, and 
thus obtain a transform that efficiently represent functions 
defined on such data. 

Several extensions of the wavelet transform, operating on 
graphs and high dimensional data, have already been proposed. 
The wavelet transforms proposed in (3), (H and were 
applied to the data points themselves, rather than functions de- 
fined on the data. Other methods took different approaches to 
construct wavelets applied to functions on the data. Maggioni 
and Coifman Q and Hammond et al. |7] proposed wavelets 
based on diffusion operators and the graph Laplacian 0, (9), 
respectively. Jansen et al. iflQll proposed three methods which 
were based on a variation of the lifting scheme full . lfT2ll . 
Gavish et al. Q assumed that the geometry and structure of 
the input data are captured in a hierarchical tree. Then, given 
such a tree, they built a data adaptive Haar-like orthonormal 
basis for the space of functions over the data set. This basis, 
can be seen as a generalization of the one proposed in 031 for 
binary trees. Our proposed method generalizes the algorithm 
in d, and can also construct data adaptive orthonormal non- 
Haar wavelets, given a data driven-hierarchical tree. 

We note that the wavelet transforms proposed in fT4l . and 
fT5lL which are defined on images, also share some similarities 
with our proposed algorithm. These methods employ pairing 
or reordering of wavelet coefficients in the decomposition 
schemes in order to adapt to their input images. In fact, the 
easy path wavelet transform proposed in fT5l . which only 
recently has come to our attention, employs a decomposition 
scheme which is very similar to ours. Constraining our algo- 
rithm to a regular data grid and using the same starting point 
and search neighborhood as the ones employed in [15], both 
algorithms essentially coincide. Nevertheless, our approach 
is more general, as it tackles the more abstract problem of 
devising a transform for point clouds or high-dimensional 
graph-data, whereas lfl"5l concentrates on images. 

In this paper we introduce a generalized tree-based wavelet 
transform (GTBWT), which is an extension of the transform 
introduced by Gavish et al Q . We first show that the transform 
of Gavish et al., when derived from a full binary tree, can 
be applied to a function over the data set using a modified 
version of the common ID Haar wavelet filtering and deci- 
mation scheme. In each level of this wavelet decomposition 
scheme, a permutation derived from the tree is applied to the 
approximation coefficients, before they are filtered. Then we 
show how this scheme can be extended to work with different 
wavelet filters, by modifying the way different levels of the 
tree are constructed. 



The construction of each coarse level of the tree involves 
finding a path which passes through the set of data points 
in the finer level. The points order in this path defines the 
permutation applied to the approximation coefficients of the 
finer level in the wavelet decomposition scheme. We propose 
a path constructed by starting from a random point, and then 
continue from each point to its nearest neighbor according 
to some distance measure, visiting each point only once. 
The corresponding permutation increases the regularity of the 
permuted approximation coefficients signal, and therefore it 
is more efficiently (sparsely) represented using the wavelet 
transform. 

Next we show that the proposed scheme is more effi- 
cient than both the common ID and 2D separable wavelet 
transforms in representing images. Finally, we explore the 
application of the proposed transform to image denoising, 
and show that combined with a proposed subimage averaging 
scheme, it achieves denoising results similar to the ones 
obtained with the K-SVD algorithm lfT6l . 

The paper is organized as follows: In Section II we describe 
how the Haar-like basis introduced in d is derived from a full 
binary tree representation of the data. We also describe how 
such a tree may be constructed. In Section III we introduce 
the generalized tree-based wavelet transform. We also explore 
the efficiency with which this transform represents an image. 
In Section IV we explore the application of our proposed 
algorithm to image denoising. We also describe the subimage 
averaging scheme and present some experimental results. We 
summarize the paper in Section V. 

II. Tree-based Haar wavelets 

Let X = {xi,...,Xjv} be the dataset we wish to analyze, 
where the samples x^E M n may be points in high dimension, 
or nodes in a weighted graph or network. Also, let / : X —> 
R be a scalar function defined on the dataset, and let V = 
{/ |/ : X —> R} be the space of all functions on the dataset. 
Here we use the following inner product with the space V: 



N 



(1) 



which is different from the one used by Gavish et al. [ 1 ], since 
it does not contain a normalizing factor before the sum. 

Gavish et al. assume that the geometry and structure of the 
data X are captured by one or several hierarchical trees. They 
do not insist on any specific construction method for these 
trees, but only that they will be balanced HI . Given such a tree, 
they construct a multiscale wavelet-like orthonormal basis for 
the space V. They start by showing that such a tree induces a 
multi-resolution analysis with an associated Haar-like wavelet. 

Let i = 1 , . . . , L denote the level in the tree, with £ = 1 
being the root and t = L being the lowest level, where each 
sample Xj is a single leaf. Also, let V £ denote the space of 
functions constant on all folders (subtrees) at level t, and let 
lx denote a constant function on X with the value 1. Then 
V 1 = Span^lx}, V L = V and by construction 



Task: Construct a complete full binary tree from 
the data X. 

Parameters: We are given the points {x.j}f =1 and 
the distance function w. 
Initialization: Set cf = Xj as the tree leaves. 
Main Iteration: Perform the following steps for i = 

L,...,2: 

• Construct a distance matrix W £ , where 

wfj =w(ci,Cj). 
. Set Vt = 0. 

• Group the points in level t in pairs by repeat- 
ing N/2 1+L ~ e times: 

- Choose a random point c £ J0 , j £ Ve, and 
updated = ViU{j }. 

- Pair c e jo with the point 
h = min^p, w e jQij . 

- Updated = VeU{ji}. 

• Place in a vector p £ the reordered point in- 
dices of Ve. 

• Construct the coarse level i - 1 from the finer 
level £ by replacing each pair cf and c] with 
the mean point 4|[cf + cj]. 

Output: The tree node points c £ and the vectors 
Pi containing the points order in each tree level. 



Cj lt where 



Algorithm 1 : Complete full binary tree construction from the data X. 



Now, let W £ (1 < i < L) be the orthogonal complement 
of V £ in V £+1 . Then, the space of all functions V can be 
decomposed as 

YL-l 



V = v l 



(3) 



V 1 c ... cV £ c V £+1 c ... c V L = V. 



(2) 



Before describing how the multiscale orthonormal basis is 
constructed given a hierarchical tree, we first describe how 
such a tree can be constructed from the data. Here we focus 
on the case of complete full binary trees and the corresponding 
orthonormal bases. For the case of more general trees and Haar 
like bases, the reader may refer to d. Let c £ denote the j-th 
point at level i of the tree, where o, 1 - = Xj , and let Vt and p£ 
denote a set and a vector containing point indices, respectively. 
Also, let w(',-) be a distance measure in R n , and let 
be a distance matrix associated with the ^-th level of the 
tree, where wjj = w(cf,cj). The distance function describes 
the first-order interaction between data-points, and therefore it 
should be chosen so as to capture some notion of similarity 
between them, which would be meaningful to the application 
at hand. A complete full binary tree can be constructed from 
the data according to Algorithm 1 . An example for such a tree 
is shown in Fig. [T] 

In the case of a full binary tree, the Haar-like basis con- 
structed from the tree is essentially the standard Haar basis 
which we denote {/0j}jLi- The adaptivity of the transform is 
manifested in the fact that this basis is used to represent a 
permuted version of the signal /, which is more efficiently 
represented by the Haar basis than / itself. The permutation 
is derived from the tree, and is dependent on the data X. Let 
p denote a vector of length TV, which contains the indices 
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Fig. 1: An illustration of a complete full binary tree. 



of the points Xj, in the order determined by the lowest 
level of the fully constructed tree. For example, for the tree 
in Fig. Hp = [1,3,4,8,2,5,6,7] T . Also let f p be the 
signal / permuted according to the vector p. The wavelet 
coefficients can be calculated by the inner products (f p : ipj), 
or by applying the ID wavelet filtering and decimation scheme 
with the Haar wavelet filters on f p . Similarly, the inverse 
transform is calculated by applying the inverse Haar transform 
on the wavelet coefficients, and reordering the produced vector 
so as to cancel the index ordering in p. We hereafter term the 
scheme described above as tree-based Haar wavelet transform, 
and we show next that it can be extended to operate with 
general wavelet filters. 



Task: Apply L - 1 levels of tree-based wavelet 
decomposition to the signal f . 
Parameters: We are given the signal f , the vectors 
{Pi}i=2, and the filters h and g . 
Initialization: Set az, = f . 

Main Iteration: Perform the following steps for £ = 

L,...,2: 

• Construct the operator P £ that reorders its 
input vector according to the indices in p^. 

• Apply Pi to at and receive af . 

• Filter a^ with h and decimate the result by 2 
to receive a^_i. 

• Filter a^ with g and decimate the result by 2 
to receive d^_i. 

Output: The approximation coefficients ai and 
detail coefficients {d^}^ 1 corresponding to f . 



Algorithm 2: Tree-based wavelet decomposition algorithm. 
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S — i 2 -> d e _ 2 



Fig. 2: Tree-based wavelet decomposition scheme 



III. Generalized tree-based wavelets 
A. Generalized tree construction and transform 

The above-described building process of the tree can be 
presented a little differently. In every level £ of the tree, we 
first construct a distance matrix W £ using the mean points 
Cj and the distance function w. Then we group in pairs the 
points Cj, according to the weights in W^, as described in 
Algorithm 1. Next we place the pairs of column vectors one 
after the other in a matrix of size n x N/2 L ~ £ , and keep 
the indices of the points in their new order in a vector pi of 
length N/2 L ~ £ . For example in level £ = 3 of the tree in Fig. 
IDC§ = [c?,c|,c|,ct]aiidp3 = [l,3,2,4] T . 

Now let h = ^ [1, 1] T and g = ^ [-1, 1] T be the Haar 

wavelet decomposition filters, and let h = -^[1,1] T and 

g = -4= [1,— 1] T be the Haar wavelet reconstruction filters. 
We notice that replacing each pair by its mean point can be 
done by filtering the rows of with the low pass filter h T , 
followed by decimation of the columns of the outcome by a 
factor of 2. For example, the points c^ and c\ in level £ = 2 
of the tree in Fig. [T] are obtained by filtering the rows of Cf 
described above with the filter h T , and keeping the first and 
third columns of the produced matrix. Effectively this means 
that the approximation coefficients corresponding to a single- 
level Haar decomposition are calculated for each row of the 
matrix C^. 

Next let f = [/(xi), . . . , /(xa/-)] t , and let and denote 
the approximation and detail coefficient vectors, respectively, 



received for f at level £, where a/, = f . Also, let Pg denote a 
linear operator that reorders a vector according to the indices 
in p£. Then, applying the Haar transform derived from the 
tree to f can be carried out according to the decomposition 
algorithm in Algorithm 2. Fig. [2 describes two single-level 
decomposition steps carried out according to Algorithm 2. We 
note that here the adaptivity of the transform is related to 
the permutations applied to the coefficients a^ in every level 
of the tree. In fact, without the operator Pg applied in each 
level, the decomposition scheme of Algorithm 2 reduces to that 
of the common ID orthogonal wavelet transform. Also, since 
permutation of points is a unitary transform, the described 
transform remains unitary. 

Finally, let the linear operator Pf 1 reorder a vector so as to 
cancel the ordering done by Pg. Then the inverse transform is 
carried out using the reconstruction algorithm in Algorithm 3. 
Fig. [3] describes two single-level reconstruction steps carried 
out according to Algorithm 3. 

We next wish to extend the scheme described above to work 
with general wavelet filters. This requires modifying the tree 
construction by replacing the Haar filters by different wavelet 
filters and changing the manner in which the points are 
ordered in each level of the tree. The ordering procedure used 
in Algorithm 1 fits only the case when Haar wavelets are used, 
and therefore needs to be replaced. 

We note that when filters other than Haar are used in the 
tree construction scheme described above, each point in the 
coarse level £ — 1 is calculated as a weighted mean of more 
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Task: Reconstruct the signal f based on a multi- 
level tree-based wavelet decomposition. 
Parameters: We are given the approximation and 
detail coefficients ai and {d^=i\ the vectors 
{Pi}i=2, and the filters h and g . 
Main Iteration: Perform the following steps for i = 
l,...,L-l: 

• Interpolate by a factor of 2 and filter the 
result with h. 

• Interpolate by a factor of 2 and filter the 
result with g. 

• Sum the results of the two previous steps to 
receive af +1 . 

• Construct the operator P^~\ that reorders its 
input vector so as to cancel the index ordering 
in p^+i. 

• Apply P^~\ to a^ +1 and receive a w . 
Output: The reconstructed signal f . 



Algorithm 3: Tree-based wavelet reconstruction algorithm. 
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Fig. 3: Tree-based wavelet reconstruction scheme 



than two points from the finer level £, where the coefficients 
in h serve as the weights. Therefore, the resultant graph is 
no longer a tree but rather a rooted L-partite graph, which 
is a graph that contains L disjoint sets of vertices so that no 
two vertices within the same set are adjacent. As the wavelet 
scheme described above was originally designed with the Haar 
wavelet filters, and in order to avoid cumbersome distinction 
between trees and L-partite graphs, with a small abuse of 
terminology we will hereafter refer to the latter also as trees. 
An example of such a "generalized" tree, is shown in Fig. [4] 
The wavelet decomposition and reconstruction schemes 
corresponding to each generalized tree are those described in 
Algorithms 2 and 3, with the necessary change of wavelet 
filters type and index vectors p£ to the ones used in the 
construction of the graph. We next propose a method to order 
the points in each level of the generalized trees. 

B. Smoothing &£ 

We wish to order the points in each a level of a tree in 
a manner which results in an efficient representation of the 
input signal by the tree-based wavelets. More specifically, we 
want the transformed signal to contain a small number of large 
coefficients, i.e. to be sparse. The wavelet transform is known 
to produce a small number of large coefficients when it is 
applied to piecewise regular signals Q. Thus, we would like 
the operator P^, applied to sl£, to produce a signal which is as 



£ = 3 



£ = 4 




*3 ^2 ^5 

Fig. 4: An illustration of a "generalized" tree. 



Task: Construct of a "generalized" tree from the 
data X. 

Parameters: We are given the points {x^-}^ and 
the weight function w . 
Initialization: Set = Xj as the tree leaves. 
Main Iteration: Perform the following steps for £ = 

L,...,2: 

• Construct a distance matrix W e , where 



• Choose a random point cjj and set Vt = 

{jo} - 

• Reorder the points c) so that they will form a 
smooth path by repeating N/2 L ~ e - 1 times: 

- Set ji — mhij£ Vg Wj j and update Ve — 
P*U{ji}. 

- Set j = ji. 

• Place in a vector p £ the reordered point in- 
dices of Vi. 

• Order the points according to the indices in 
Pi and place them in a matrix C p t . 

• Obtain the points c^ -1 by: 

- Apply the filter h T to the matrix Cf. 

- Decimate the columns of the outcome by a 
factor of 2. 

Output: The tree node points c] and the vectors 
Pi containing the points order in each tree level. 



Algorithm 4: Construction of a "generalized" tree from the data X. 



regular as possible. When the signal f is known, the optimal 
solution would be to apply a simple sort operation on the 
corresponding coefficients a.£, obtained in each level. However, 
since we are interested in the case where f is not necessarily 
known (such as in the case where f is noisy, or has missing 
values), we would try to find a suboptimal ordering operation 
in each level £, using the feature points Cj. We assume that 
under the distance measure w(-, •), proximity between the two 
points cf and suggests proximity between the coefficients 
CLi(i) and ae(j). Thus, we would try to reorder the points 
so that they will form a smooth path, hoping that the 
corresponding reordered ID signal will also be smooth. 

The "smoothness" of a ID signal y of length N can be 
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measured using its total variation 

N 

l|y|k = El^')-^'-i)|. (4) 

By analogy, we measure the "smoothness" of the path by 
the measure 

N 

Cl V = J2 W ( C P C U)' (5) 

We notice that smoothing the path comes down to finding 
the shortest path that passes through the set of points Cp 
visiting each point only once. This can be regarded as an 
instance of the traveling salesman problem ifTTlL which can 
become very computationally exhaustive for large sets of 
points. A simple approximate solution is to start from a 
random point, and then continue from each point to its nearest 
neighbor, not visiting any point twice. As it turns out, ordering 
the points Cj in this manner improves the performance of the 
resultant transform. 

A generalized tree construction, which employs the pro- 
posed point ordering method is summarized in Algorithm 4. 
Fig. [4] shows an example of "generalized" tree which may 
be obtained with Algorithm 4 using a filter h of length 4 
and disregarding boundary issues in the different levels. We 
term the filtering schemes described in Algorithms 2 and 3 
combined with the tree construction described in Algorithm 4 
generalized tree-based wavelet transform (GTBWT). 

An interesting question is whether the wavelet scheme 
described above represents the signal f more efficiently than 
the common ID and 2D separable wavelet transforms. Here, 
we measure efficiency by the m-term approximation error, i.e. 
the error obtained when representing a signal with m non-zero 
transform coefficients. 

Before relating to this question, we first describe how the 
wavelet scheme described above can be applied to images. Let 
F be a grayscale image of size y/N x and let f be its 
column stacked representation, i.e. f is a vector of length TV 
containing individual pixel intensities. Then we first need to 
extract the feature points x$ from the image, which will be 
later used to construct the tree. Let fa be the i-th sample in f , 
then we choose the point associated with it as the 9 x 9 patch 
around the location of fa in the image F. We next construct 
several trees, each with a different wavelet filter, according to 
the scheme described above. We choose the weight function 
w to be the squared Euclidean distance, i.e. the (i,j) element 
in is 

< = ||c!-c^|| 2 . (6) 

We use the transforms corresponding to these trees to obtain 
m-term approximations of a column stacked version of the 
128 x 128 image shown in Fig. |3a) (the center of the Lena 
image). The approximations, shown in Fig. \5\ were carried 
out by keeping the highest coefficients (in absolute value) in 
the different transform domains. We compare these results be- 
tween themselves, and to the m-term approximations obtained 
with the common ID and the 2D separable wavelet transforms 
(the latter applied to the original image) corresponding to 
the same wavelet filters. Fig. [5] shows m-term approximation 



results (PSNR) obtained for the image in Fig. [T^a) using 
different wavelet filters. It can be seen that the generalized tree- 
based wavelet transform outperforms the two other transforms 
for all the wavelet filters that have been used. It can also be 
seen that the PSNR gap in the first thousands of coefficients 
increases with number of vanishing moments of the wavelet 
filters. Further, Figure [3d) shows that the generalized tree- 
based wavelet results obtained with the db4 and db8 filters 
are close, and better than the ones obtained with the Haar 
filter. 

Before concluding this section, we are interested to see how 
the basis elements of the proposed wavelet transform look like. 
Each basis element is obtained by setting the corresponding 
transform coefficient to 1 and all the other coefficients to zero, 
and applying the inverse transform. Unlike the case of common 
ID wavelet bases, the series of data derived permutations 
applied to the reconstructed coefficients in the different stages 
of the inverse transform, result in basis functions which adapt 
themselves to the input signal /. 

As a first example, we examine the basis elements obtained 
for the synthetic image of size 64 x 64, shown in Fig 0a). 
The image contains in its middle a square rotated in an angle 
of 45 degrees. We visualize the basis elements as images by 
reshaping them to the size of the input image. Figs 0b)- 
(n) show some of the basis elements corresponding to the 
synthetic image, and obtained with the Symmlet 8 filter. The 
figures show two scaling functions from level £ = 1, and 
two wavelet basis functions from each level £ = 1, . . . , 12, 
all corresponding to the two largest coefficients in the same 
level. We note that the reason we have more than one scaling 
function and one wavelet basis element in coarsest level £ = 1 
is related to our implementation of the transform. We use 
symmetric padding in the signal boundaries before applying 
it the wavelet filters, which slightly increases the number of 
coefficients (and corresponding basis elements) obtained in 
each level of the tree. It can be seen how the scaling functions 
and wavelets in the low levels of the tree (£ = 1 , . . . , 4) 
represent the low frequency information in the image, i.e. 
smooth versions of the rectangle. It can also be seen that the 
wavelet functions represent finer edges as the level of the tree 
increases, and that they successfully manage to capture edges 
which are not aligned with the vertical and horizontal axes. 

We next examine the wavelet basis elements corresponding 
to the image in Fig. 13a), obtained with the Symmlet 8 filter. 
Figs. I3b)-(p) show two scaling functions from level £ = 1, 
and two wavelet basis functions from each level £— 1, . . . , 14, 
all corresponding to the two largest coefficients in the same 
level. It can be seen how the basis functions adapt to the shapes 
in the images. Here again the scaling functions and wavelets 
in the low levels of the tree (£ = 1 , . . . , 4) represent the low 
frequency information in the image, and the wavelet functions 
represent finer edges in the image as the level of the tree 
increases. We next present the application of the generalized 
tree-based wavelet transform to image denoising. 



6 



generalized tree-based (solid) 

common 1 D (dotted) 

2D separable (dashed) 



4000 6000 
# Coefficients 



(a) 



-generalized tree-based (solid) 
■ common 1 D (dotted) 
-2D separable (dashed) 



4000 6000 
# Coefficients 



(c) 



generalized tree-based (solid) 




common 1 D (dotted) 




2D separable (dashed) 
















/ / 




/ * 
t 




// 
i- 




t 

i 
i 





4000 6000 
# Coefficients 




1 500 2000 2500 
# Coefficients 



(d) 



Fig. 5: ra-term approximation results (PSNR in dB) for the generalized tree-based, common ID and 2D separable wavelet transforms, 
obtained with different wavelet filters: (a) Daubechies 1 (Haar). (b) Daubechies 4. (c) Daubechies 8. (d) Comparison between the generalized 
tree-based wavelet results obtained with the different filters. 



IV. Image Denoising using the Generalized 
tree-Based Wavelet Transform 

A. The Basics 

Let F be an image of size V^V x yfN, and let F be its 
noisy version: 

F = F + Z. (7) 

Z is a matrix of size x \fN which denotes an additive 
white Gaussian noise independent of F with zero mean and 
variance a 2 . Also, let f and f be the column stacked represen- 
tations of F and F, respectively. Our goal is to reconstruct f 
from f using the generalized tree-based wavelet transform. To 
this end, we first extract the feature points from F similarly 
to the way they were extracted from F in the previous section. 
Let fi be the i-th sample in f , then here we choose the point 
associated with it as the 9 x 9 patch around the location 
of fi in the image F. We note that since different features are 
used for the clear and noisy images, the corresponding trees 
derived from these images will also be different. Nevertheless, 
it was shown in fT8lh fT9lh l20l that the distance between the 
noisy patches x^ and Xj is a good predictor for the similarity 
between the clear versions of their middle pixels fi and fj. 
This means that the tree construction is roughly robust to 
noise, and that the obtained tree will capture the geometry 



and structure of the clear image quite well. We will further 
discuss the noise robustness of the algorithm in Section IIV-CI 

We next construct the tree according to the scheme de- 
scribed in Algorithm 4, where the dissimilarity between the 
points cf and in level i is again measured by the squared 
Euclidean distance between them, which can be found in the 
(i, j) location in W^. 

Similarly to Gavish et al. |T], we use an approach which 
resembles the "cycle spinning" method ifTTTl in order to 
smooth the image denoising outcome. This means that we 
randomly construct 10 different trees, utilize the transforms 
corresponding to each of them to denoise f , and average the 
produced images. Each level of a random tree is constructed 
by choosing the first point at random, and then continue from 
each point x JO to its nearest neighbor x JX with a probability 
pi oc exp(— ||xj — x jx || 2 /e), or to its second nearest neighbor 
x j2 with a probability p2 oc exp(— ||x JO — x j2 || 2 /e), where 
here we set e = 0.1. 

The denoising itself is performed by applying the proposed 
wavelet transform (derived from the tree of f ) on f , using hard 
thersholding on the transform coefficients, and computing the 
inverse transform. 

In order to assess the performance of the proposed denoising 
algorithm we first apply it, using different wavelet filters, to a 




Fig. 6: Generalized tree-based wavelet basis elements derived from a synthetic image: (a) the original image, (b) scaling functions (£=l). (c) 
wavelets (£=1). (d) wavelets (£=2). (e) wavelets (£=3). (f) wavelets (£=4). (g) wavelets (£=5). (h) wavelet (£=6). (i) wavelet (£=7). (j) wavelet 
(£=8). (k) wavelets (£=9). (1) wavelets (£=10). (m) wavelets (£=11). (n) wavelets (£=12). 



noisy version of the image shown in Fig.Oa). For each of the 
transforms we perform 5 experiments for different realizations 
of noise with standard deviation a = 25 and average the 
results - these averages are given in Table [H We note that 
the denoising thresholds were manually found to produce 
good denoising results, as the theoretical wavelet threshold 
(jy/2 log e N led to poorer results, with PSNR values which 
were lower by about 0.6 dB. It can be seen that better results 
are obtained with the Symmlet wavelets, and that generally 
better results are obtained with wavelets with a high number of 
vanishing moments. It can also be seen that all the transforms 
require about 320 coefficients to represent the image (for each 
of the 10 results produced by the different trees) which is 
about 2 percents of the number of coefficients required in the 
original space. 

We next apply the proposed scheme with the Symmlet 8 
wavelet to noisy versions of the images Lena and Barbara, 
with noise standard deviation a = 25 and PSNR of 20.17 
dB. We note that this time we use patches of size 13 x 13 
and perform only one experiment for each image, for one 
realization of noise. The clear, noisy and recovered images 
can be seen in Fig. [8] For comparison reasons, we also apply 
to the two images the denoising scheme of Elad et al. fT6lL 
which utilize the K-SVD algorithm l22l . We chose to compare 
our results to the ones obtained by this scheme, as it also 
employs an efficient representation of the image content, and is 
one of many state-of-the-art image denoising algorithms l23ll . 
GU, which are based on sparse and redundant representations 
ll25ll . However, we note that this scheme is based on efficient 
representations of the image patches, while our scheme is 
based on efficient representation of the whole image. The 
PSNR of the results obtained with the proposed scheme and 



the K-SVD algorithm are shown in Table HH It can be seen that 
the results obtained by our algorithm are inferior compared to 
the ones obtained with the K-SVD. We next try to improve 
the results produced by our proposed scheme by adding an 
averaging element into it, which is also a variation on the 
"cycle spinning" method. 

B. Subimage averaging 

Let X p be an n x (y/N — y/n + l) 2 matrix, containing 
column stacked versions of all the y/n x y/n patches inside 
the image. When we built a tree for the image, we assumed that 
each patch is associated only with its middle pixel. Therefore 
the tree was associated with the signal composed of the middle 
points in the patches, which is the middle row of X p , and 
the transform was applied to this signal. However, we can 
alternatively choose to associate all the patches with a pixel 
located in a different position, for example the top right 
pixel in each patch. Since the whole patches are used in the 
construction of the tree, effectively this means that the tree can 
be associated with any one of the N p signals located in the 
rows of X p . These signals are the column stacked versions of 
the n subimages of size (y/N — y/n + 1) x (\/N — y/n + 1), 
whose top right pixel reside in the top right y/n x y/n patch in 
the image. We next apply the generalized tree-based wavelet 
transform denoising scheme to each of these n signals. We 
then plug each subimage into its original place in the image, 
and average the different values obtained for each pixel, 
similarly to the way an image is reconstructed from its patches 
in the K-SVD image denoising scheme fT6l . We hereafter refer 
to scheme described above as Subimage Averaging (SA). 

The subimage averaging scheme can also be viewed a little 
differently. The tree also defines a wavelet transform on the 
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(m) (n) (o) (p) 

Fig. 7: Generalized tree-based wavelet basis elements derived from an image: (a) the original image, (b) scaling functions (£=1). (c) wavelets 
(£=1). (d) wavelets (£=2). (e) wavelets (£=3). (f) wavelets (£=4). (g) wavelets (£=5). (h) wavelets (£=6). (i) wavelets (£=7). (j) wavelets (£=8). 
(k) wavelets (£=9). (1) wavelets (£=10). (m) wavelets (£=11). (n) wavelets (£=12). (o) wavelets (£=13). (p) wavelets (£=14). 



data points Xj , where the approximation coefficient vectors in 
level £ are the points c j . We denote detail coefficients vectors 
in level £ as . These vectors can be calculated by applying 
the filter g T to the matrix C^ +1 and decimating the columns of 
the outcome by a factor of 2. A matrix Q containing and all 
the detail coefficient vectors {q^}f~^ can also be obtained by 
applying the generalized tree-based wavelet transform to each 
row of the matrix X p . Similarly, the inverse transform can 
be performed on Q by applying each row of this matrix the 
generalized tree-based wavelet inverse transform. Therefore 
the subimage averaging scheme can be viewed as applying the 
transform derived from the tree directly on the image pathes, 



performing hard thresholding on them, applying the inverse 
transform to the coefficient vectors, and reconstructing the 
image from the clean patches. 

The results obtained with the generalized tree-based wavelet 
transform, combined with the subimage averaging procedure, 
for the noisy version of the image in Fig. Oa) are shown in 
Table d It can be seen that this procedure increases the PSNR 
of the results by about 0.85 dB. We note that since we use 
9x9 patches, each of the 10 images averaged in the cycle 
spinning procedure is reconstructed from 81 subimages, and it 
can be seen that on average about 360 transform coefficients 
are required to represent each one of these subimages. Thus 
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TABLE I: Denoising results of a noisy version of the image in Fig. 
Ha) (a = 25, input PSNR=20.18 dB), obtained using the GTBWT, 
with and without subimage averaging (SA), and with different 
wavelet filters. Also shown is the average number of coefficients 
(#coeffs.) used by each scheme to represent an image. 





GTBWT 


with SA 


wavelet type 


PSNR [dB] 


#coeffs 


PSNR [dB] 


#coeffs 


dbl (Haar) 


28.19 


303.28 


29.05 


349.02 


db4 


28.5 


307.62 


29.37 


341.25 


db8 


28.49 


384.08 


29.33 


403.2 


sym2 


28.45 


293.84 


29.25 


326.48 


sym4 


28.51 


308.6 


29.38 


349.81 


sym8 


28.55 


342.76 


29.39 


392.77 



TABLE II: Denoising results (PSNR in dB) of noisy versions of the 
images Barbara and Lena (a = 25, input PSNR=20.17 dB) obtained 
with: 1) GTBWT with a Symmlet 8 filter with subimage averaging 
(SA) and without it. 2) with the K-SVD algorithm. 



Image 


GTBWT 


GTBWT + SA 


K-SVD 


Lena 


30.3 


31.21 


31.32 


Barbara 


28.94 


29.82 


29.62 



the number of transform coefficients required to represent each 
subimage is slightly higher than the one required to represent 
an image when the subimage averaging is not used, but is still 
much lower than the number required in the original space. 

We also applied the proposed denoising scheme to the 
noisy Lena and Barbara images. The reconstructed images 
are shown in Figs. Od) and (h), and Table HJ shows the 
corresponding PSNR results. It can be seen that the results 
obtained with the proposed algorithm for the Lena image are 
now closer to the ones received by the K-SVD algorithm, and 
the results obtained for the Barbara image are better than the 
ones obtained with the K-SVD algorithm. 

We note that only a subset of design parameters has been 
explored when we applied the proposed image denoising 
algorithm. These design parameters include the patch size, the 
wavelet filter type (we have not explored biorthogonal wavelet 
filters at all), and the number of trees averaged by the cycle 
spinning method. They also include the number of nearest 
neighbors to choose from in the construction process of each 
level of these trees, and the parameter e used to determine the 
probabilities of choosing them. We believe that better results 
may be achieved with the adequate design parameters, and 
therefore more tests need to performed in order to search 
for the parameters which optimize the performance of the 
algorithm. We also note that the use of the proposed sub- 
image averaging scheme is not restricted to the generalized 
tree-based wavelet transform. This scheme may be used to 
improve the denoising results obtained with different methods 
which use distance matrices similar to the one employed here. 

C. Robustness to noise 

We wish to explore the robustness of the generalized tree- 
based wavelet denoising scheme to the noise in the points Xj . 
More specifically we wish to check how using cleaner image 
patches will affect the denoising results. Here we obtained 
"clean" patches from a noisy image by applying our denoising 



TABLE III: Denoising results (PSNR in dB) of noisy versions of 
the images Barbara and Lena (a = 25, input PSNR=20.17 dB) 
obtained using the GTBWT with subimage averaging. The trees 
were constructed using patches obtained from the noisy (1 iter), 
original (clean) and reconstructed (2 iters) images. 



Image 


1 iter 


2 iters 


clean 


Lena 


31.21 


31.34 


32.25 


Barbara 


29.62 


29.71 


30.61 



scheme once, and using the patches from the reconstructed 
image in a second iteration of the denoising algorithm for 
defining the permutations. For comparison reasons we also 
used clean patches from the original clean image in our 
denoising scheme, and regarded the obtained results as oracle 
estimates. 

We applied the GTBWT denoising schemes, obtained with 
patches from the clean and reconstructed images, to the noisy 
Barbara and Lena images. The results obtained with the 
Symmlet 8 filter and with subimage averaging are shown in 
Table [TTT1 It can be seen that large improvements of about 
1 dB for the Lena image and 0.8 dB for the Barbara image 
are obtained with clean patches. Applying 2 iterations of the 
denoising algorithm slightly improves the results of the Lena 
image by about 0.13 dB. However, applying 2 iterations of 
the denoising algorithm to the Barbara image degrades the 
obtained results by about 0.11 dB. This degradation probably 
results from oversmoothing of the patches of the reconstructed 
image, which leads to a loss of details. A choice of a different 
threshold in the denoising algorithm applied in first iteration, 
or a different method to clean the patches altogether, may lead 
to improved denoising results. 

V. CONCLUSION 

We have proposed a new wavelet transform applicable to 
graphs and high dimensional data. This transform is an exten- 
sion of the multiscale harmonic analysis approach proposed 
by Gavish et al. IB. We have shown a relation between the 
transform suggested by Gavish et al. and ID Haar wavelet 
filtering, and extended the scheme so it will use general 
wavelet filters. We demonstrated the ability of the generalized 
scheme to represent images more efficiently than the common 
ID and separable 2D wavelet transforms. We have also shown 
that our proposed scheme can be used for image denoising, and 
that combined with a subimage averaging scheme it achieves 
denoising results which are close to the state-of-the-art. 

In our future work plans, we intend to consider the following 
issues: 

1) Seek ways to improve the method that reorders the 
approximation coefficients in each level of the tree, 
replacing the proposed nearest neighbor method. 

2) Extend this work to redundant wavelets. 

3) Improve the image denoising results by using two iter- 
ations with different threshold settings, and by consid- 
ering spatial proximity as well in the tree construction. 
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Fig. 8: Denoising results of noisy versions of the images Barbara and Lena (a = 25, input PSNR=20.17 dB) obtained with GTBWT with 
a Symmlet 8 filter with subimage averaging (SA) and without it: (a) Original Lena, (b) Noisy Lena (20.17 dB). (c) Lena denoised using 
GTBWT (30.3 dB). (d) Lena denoised using GTBWT with sub image averaging (31.21 dB) (e) Original Barbara, (f) Noisy Barbara (20.17 
dB). (g) Barbara denoised using GTBWT (28.94 dB). (h) Barbara denoised using GTBWT with sub image averaging (29.82 dB). 



work. The authors also thank the anonymous reviewers for 
their helpful comments. 

References 

[1] M. Gavish, B. Nadler, and R. R. Coifman, "Multiscale wavelets on 
trees, graphs and high dimensional data: Theory and applications to 
semi supervised learning," in Proceedings of the 27th International 
Conference on machine Learning, 2010. 

[2] S. Mallat, A Wavelet Tour of Signal Processing, The Sparse Way. 
Academic Press, 2009. 

[3] F. Murtagh, "The Haar wavelet transform of a dendrogram," Journal of 
Classification, vol. 24, no. 1, pp. 3-32, 2007. 

[4] A. B. Lee, B. Nadler, and L. Wasserman, "Treelets-An adaptive multi- 
scale basis for sparse unordered data," Annals, vol. 2, no. 2, pp. 435^71, 
2008. 

[5] G. Chen and M. Maggioni, "Multiscale geometric wavelets for the 
analysis of point clouds," in Information Sciences and Systems (C1SS), 
2010 44th Annual Conference on. IEEE, 2010, pp. 1-6. 

[6] R. R. Coifman and M. Maggioni, "Diffusion wavelets," Applied and 
Computational Harmonic Analysis, vol. 21, no. 1, pp. 53-94, 2006. 

[7] D. K. Hammond, R Vandergheynst, and R. Gribonval, "Wavelets on 
graphs via spectral graph theory," Applied and Computational Harmonic 
Analysis, 2010. 

[8] F R. K. Chung, "Spectral graph theory," vol. 92, CBMS Regional 

Conference Series in Mathematics. AMS Bookstore, 1997. 
[9] U. Von Luxburg, "A tutorial on spectral clustering," Statistics and 

Computing, vol. 17, no. 4, pp. 395-416, 2007. 
[10] M. Jansen, G. R Nason, and B. W. Silverman, "Multiscale methods for 

data on graphs and irregular multidimensional situations," Journal of the 

Royal Statistical Society: Series B (Statistical Methodology), vol. 71, 

no. 1, pp. 97-125, 2009. 
[11] W. Sweldens, "The lifting scheme: A custom-design construction of 

biorthogonal wavelets," Applied and Computational Harmonic Analysis, 

vol. 3, no. 2, pp. 186-200, 1996. 
[12] , "The lifting scheme: A construction of second generation 

wavelets," SIAM Journal on Mathematical Analysis, vol. 29, no. 2, p. 

511, 1998. 

[13] K. Egiazarian and J. Astola, "Tree- structured Haar transforms," Journal 
of Mathematical Imaging and Vision, vol. 16, no. 3, pp. 269-279, 2002. 



[14] S. Mallat, "Geometrical grouplets," Applied and Computational Har- 
monic Analysis, vol. 26, no. 2, pp. 161-180, 2009. 

[15] G. Plonka, "The Easy Path Wavelet Transform: A New Adaptive Wavelet 
Transform for Sparse Representation of Two-Dimensional Data," Mul- 
tiscale Modeling & Simulation, vol. 7, p. 1474, 2009. 

[16] M. Elad and M. Aharon, "Image denoising via sparse and redundant 
representations over learned dictionaries," IEEE Transactions on Image 
Processing, vol. 15, no. 12, pp. 3736-3745, 2006. 

[17] T. H. Cormen, Introduction to algorithms. The MIT press, 2001. 

[18] A. Buades, B. Coll, and J. M. Morel, "A review of image denoising 
algorithms, with a new one," Multiscale Modeling and Simulation, vol. 4, 
no. 2, pp. 490-530, 2006. 

[19] A. D. Szlam, M. Maggioni, and R. R. Coifman, "Regularization on 
graphs with function-adapted diffusion processes," The Journal of Ma- 
chine Learning Research, vol. 9, pp. 1711-1739, 2008. 

[20] A. Singer, Y. Shkolnisky, and B. Nadler, "Diffusion interpretation of 
non-local neighborhood filters for signal denoising," SIAM J. on Imag. 
Sci, vol. 2, no. 1, pp. 118-139, 2009. 

[21] R. R. Coifman and D. L. Donoho, Wavelets and Statistics. Springer- 
Verlag, 1995, ch. Translation-invariant de-noising, pp. 125-150. 

[22] M. Aharon, M. Elad, and A. Bruckstein, "K-SVD: An algorithm for 
designing overcomplete dictionaries for sparse representation," IEEE 
Transactions on signal processing, vol. 54, no. 11, p. 4311, 2006. 

[23] J. Mairal, M. Elad, G. Sapiro et al, "Sparse representation for color 
image restoration," IEEE Transactions on Image Processing, vol. 17, 
no. 1, p. 53, 2008. 

[24] J. Mairal, F Bach, J. Ponce, G. Sapiro, and A. Zisserman, "Non-local 
sparse models for image restoration," in Computer Vision, 2009 IEEE 
12th International Conference on. IEEE, 2010, pp. 2272-2279. 

[25] A. M. Bruckstein, D. L. Donoho, and M. Elad, "From sparse solutions of 
systems of equations to sparse modeling of signals and images," SIAM 
review, vol. 51, no. 1, pp. 34-81, 2009. 



